arXiv:1501.07844vl [cs.CV] 30 Jan 2015 


A Proximal Bregman Projection Approach to Continuous 
Max-Flow Problems Using Entropic Distances 

John S.H. Baxter“’^, Martin Rajchl“’^, Jing Yuan®’^, and Terry M. Peters“’^ 

“Robarts Research Institute, London, Ontario, Canada; 

^Western University, London, Ontario, Canada 

ABSTRACT 

One issue limiting the adaption of large-scale multi-region segmentation is the sometimes prohibitive memory 
requirements. This is especially troubling considering advances in massively parallel computing and commercial 
graphics processing units because of their already limited memory compared to the current random access memory 
used in more traditional computation. To address this issue in the field of continuous max-flow segmentation, we 
have developed a pseudo-flow framework using the theory of Bregman proximal projections and entropic distances 
which implicitly represents flow variables between labels and designated source and sink nodes. This reduces 
the memory requirements for max-flow segmentation by approximately 20% for Potts models and approximately 
30% for hierarchical max-flow (HMF) and directed acyclic graph max-flow (DAGMF) models. This represents 
a great improvement in the state-of-the-art in max-flow segmentation, allowing for much larger problems to be 
addressed and accelerated using commercially available graphics processing hardware. 
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1. INTRODUCTION 

Multi-region segmentation involves the partitioning of an image into multiple disjoint regions, each of which 
competes for space in the image. The rationale behind such a partitioning in an optimization-based manner is 
wide and multifaceted. In medical imaging, this approach can be used for segmentation and the identification of 
anatomy of interest from medical datasets. In computer vision, these approaches can be used for decomposing 
a scene into multiple regionJill^ or reconstructing images heavily polluted with noise.^^ 

Binary graph-cut^ were arguably the first instantiation of this approach, but have been necessarily limited, 
with no inherent extensibility. The binary model in a continuous space has been addressed by Yuan et al.,1^ 
specifically, the solution to the equation: 

E{u) = j{Ds{x)u{x) Dt{x){l — u{x)) S'(a:)|VM(a:)|)dx 

n 

s.t. u{x) G {0,1} . 

The technique used in the continuous max-flow model made use of augmented Lagrangian multipliers applied 
to the convex relaxation of the above equation. The solution to the relaxed form could then be thresholded, 
yielding the optimal solution to the case of discrete label values. 

The simplest model reflecting this partition is the Potts model.l^ This model has been largely explored by 
the computer vision community, yielding a proof of the NP-hardness of such a problem and the proliferation 
of methods for quickly approximating optimal solutions.^ Yuan et al.^ have also addressed the corresponding 
continuous optimization problem: 

E{u) = ^ f {Dl{x)ul{x) S{x)\VuL{x)\)dx 

Vi n 

s.t. ulIx) > 0 and ''^ul{x) = I . 

VL 
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However, the Potts model is limited in that it does not allow for regions to take on variable smoothness specific 
to their own requirements, or to collect into super-objects expressing more complex anatomy. 

The first movement towards incorporating structure and variable regularization was the discrete Ishikawa 
modeP which required the objects being segmented to be placed in a linear order. This linear order is interpreted 
as a sub-set relationship, that is, each object is a subset of its ‘parent’ object in the ordering, and a superset of 
its ‘child’ object. Bae et al.^ extended the work on the continuous binary min-cut problem to the continuous 
Ishikawa model: 


N 

E{u) = ^ / {Dl{x)ul{x) + S{x)\VuLix)\)dx 
^=0n 

s.t. ul{x) G {0,1} and ul+i{x) < ul(x) 


using similar variational methods but a tiered continuous graph analogous to that used by IshikaweP in the 
discrete case, that is, with finite capacities on intermediate flows between labels. 

The Potts and Ishikawa models allowed for immediate extensibility to multiple objects or regions, but did 
not extend this to a flexibility in how those objects could be arranged. Such flexibility is highly desirable in 
segmentation problems in which complex anatomy could be modeled a complex of simpler anatomical objects 
under joint regularization to encourage adjacency. This has explored in the discrete domain by Delong et al.l^li^ 
An early approach to this in the continuous domain was the limited hierarchical model proposed by Rajchl 
et al.l^ for myocardial scar segmentation. This hierarchical approach allowed for the myocardium and scar 
to be given vastly different regularization parameters than the background, necessary due to their elongated 
structure, without necessitating a linear order over the objects. The Hierarchical Max-Flow (HMF) method 
in the continuous domain was formalized and generalized by Baxter et al.,^ allowing for run-time selection of 
hierarchies and minimizing development time in the use of hierarchical max-flow segmentation methods. 


Maintaining sub-modularity for a broader selection of models has been a focus in the discrete graph cuts 
community due to the optimizability via graph cuts, and Delong et al.^^ have created a framework for multi-region 
segmentation with more general part-whole relationships being expressible. Even more recently, the continuous 
max-flow approach has been further generalized in Directed Acyclic Graphical Max-Flow (DAGMF) in which any 
set of objects can have a regularization term and these terms simultaneously applied.^ Ultimately, this family 
of models is the most general possible, exhausting the class of models using only the union, intersection, subset, 
and superset operators from set theory. (However, more models could be thought of with different constraints 
and considerations above simply having multiple explicit part/whole relationships.) 


Focusing on continuous max-flow models, one issue with complex models is the memory requirements. In 


the traditional full-flow 


label variable and flow variable are explicitly represented in 


memory, leading to a large memory footprint, especially for three-dimensional datasets. The benefit of such an 
approach is, by the theory of Lagrangian multipliers, the constraints placed on the label variables were implicit, 
and the algorithm had guaranteed convergence to a feasible solution. This did however imply that intermediate 
steps were not necessarily feasible, requiring some correction to the labels to be applied in post-processing, often 
lost in the process of discretization. 


2. CONTRIBUTIONS 

The primary contribution of this paper is to consolidate the theory of Bregman proximal projection^i^ and 
entropic distance metrics to continuous max-flow segmentation with multiple regions. This will first contain an 
algorithm and corresponding proof of correctness for the Potts model (with variable smoothness parameters). 
This will then be extended to the Ishikawa, HMF, and DAGMF models demonstrating the general applicability 
of this method to max-flow segmentation under any label ordering structure. 
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3. CONVEX RELAXED HIERARCHICAL MODELS AND PREVIOUS WORK 


3.1 Previous Work 

Traditionally, continuous max-flow segmentation models have involved the explicit representation of every flow 
variable in memory, leading to larger memory consumption. Recently, this has been addressed for binary models 
by Bae et al^ in two stages. The first implicitly represents the source flow noting that under the flow conser¬ 
vation constraint, it could be expressed analytically in terms of the sink flow and divergence of the spatial flow. 
The second method, which we will be extending in this paper, was called the pseudo-flow representation and 
implicitly represented both source and sink flows, requiring only the spatial flows and labels to be stored. In this 
case, the six buffers required for the full-flow version is reduced to only four buffers, a significant improvement 
in efficiency of memory use. 

3.2 Pseudo-Flow Approach to the Potts Model 

We will proceed in developing a pseudo-flow model following the approach taken by Bae et al.^ starting with 
the continuous Potts model. 

Yuan et al.l^ have shown the equivalence between the convex relaxed continuous Potts model: 

min^ / {DL{x)uLix)S{ x)\Vul{x)\) dx 

u(x) f 

v-c- n (1) 

s.t. \/LuLix) > 0 and Vx, '^^ul{x) = 1 

VL 


to the primal max-flow model: 


max 

Ps(x),q(x),p(x) 


Ps{x)dx 


n 

s.t. ^{L,x), pl{x) < Dl{x) 

VL, \qL{x)\ < Sl{x) 

VL, divqL{x) - psix) +PL(a:) = 0 


( 2 ) 


through the primal dual model: 


min max 

u{x) ps(x),q(x),p(x) 


Ps(x) + i: ul(x) (div qL(x) 


VL 


s.t. VL, pl(x) < Dl{x) 
VL, \qL{x)\ < Sl{x) 


ps{x) -{-pl{x)) dx 


( 3 ) 


noting that in both the primal and the primal-dual formulations, the labeling constraints ul{x) > 0 and 
~ 1 become implicit. 

Firstly, we need to show that the formulation can be addressed by developing the appropriate non-smooth 
pseudo-flow model: 

max / min (L>i(x)-I-div mix)) dx (4) 

\qL(x)\<SL{x)J L ' ' ' 

n 
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which can be derived from equation ([3]) as shown in the following steps: 

min max / ^5(0;) + (div 17^(0:) - ps(a:) + da; 

u(x)ps(x),q{x),p(x)J^ V ^ / 

s.t. V(L,a;), pl(x) < Dl(x), \qL{x)\ < S'l(x) 

= min max / ps(x) - ^ ml(x)ps(x) + ^ wl(x) (div ^^(x) + pl(x)) dx 

u{x) psix),q{x),p(x) ^ y J 

S.t. V(L,x), Pz,(a:) < Dl{x), |gL(x)| < S'l(x) 

= min max / > atr (x) (div or (x) + pr(x)) dx 

u(x) q{x),p{x) J ^ 

s.t. ^ul(x) = 1 , VL, pl(x) < L>l(x), |gL(x)| < S'l(x) 

(as the equivalence of equations © and @ guarantees the constraint ~ ^ ) 

= min max E ul (x) (div (JL (x) + Dl (x)) dx 

'^(^) <?(^) J wr 

o 

s.t. ^ul(x) = 1 , VL, ml(x) > 0 , \qL{x)\ < ^^(x) 


VL 


(as the equivalence of equations © and @ guarantees the constraint VL, ul{x) = 1 
and maxp^(a,)<£,^( 3 ,) ul(x)pl(x) = Dl{x) if ml(x) is non-negative) 

= max min [ Vul(x) {div q^x) + Dl{x)) dx 

q{x) u{x) J 


VL 


S.t. ^■ul(x) = 1, VL, ul(x) > 0, \qL{x)\ < Sl{x) 

VL 

= max / min (L)l(x)-I- divfli(x)) dx 


(since min„(,j,) X)vl ^L(a;) (Dl(x) -I- div(3'L(x)) = minL(DL(x) -|- divqL(x)) when 
SvL «L(a;) = 1 and VL, ul(x) > 0) 

Now that a pseudo-flow representation is developed, we can take advantage of Bregman proximal projectionJi^ 
to optimize this formula. Consider the distance between vector-valued labeling functions u(x) and v{x) as: 


dg{u,v) = / I ULjx) \n{uLix)/vLix)) - ul{x) + vl{x) j dx 

n VvL / 

which can be verified to be a Bregman distance (when VL, ul{x) € [0,1]) using the entropy function: 

g{u) = J ^2 i'^L^x) InuL(x) - ul{x)) dx . 


(5) 


If we consider a feasible labeling, u(x), we can find another proximal labeling, u{x), which has a lower energy 
by addressing the optimization: 


m(x) = argmin 

ul{x)>0, J2vL XtL(x) = 


max / ul(x) {Dl{x) -\- diY qLix)) dx + cdniu.v) 
l\\qL{x)\<SL{x)J 


( 6 ) 
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where c is a positive constant. Using a Lagrangian multiplier on the constraint 
u{x) analytically as: 


ul{x) 


vl 


(x) exp 


Dz(x)+div gL(x) 


Evl' VL'(x)exp 


D^,(x)+div qLi(,x) 


ul{x) = 1 , we can solve for 


(7) 


noting that this answer fulfills the constraint ul{x) > 0 provided the same holds for v(x). By letting the distance 
weighting parameter c approach 0, ul{x) ^ 0 if Dl{x) + divqL{x) ^ Tahiti {D {x) + A\YqLi{x)). Using that 
fact, 


mm 

ul{x)>0, J2vl ul(x) = 1 


max 

191,(0;) I < Si, (x) 


J ul{x) {Dl{x) + div 9 L(a;)) dx + cdg{u,v) 


—>■ max 

|9I,(a;)|<Si,(a:) 


/ 


nun (£)L(a:) + divgL(a:)) as c —>• 0 


illustrating that the Bregman proximal method acts just as well as a smoothed version of the non-smooth 
pseudo-flow representation, equation (| 3 ]). 

By taking the gradient of equation ([S]) with respect to div 59 ,( 0 ;) (with ul{x) substituted by equation (fl31) l 
one can derive the appropriate Chambolle iteration schem^^^ for maximizing ([S]) with respect to the spatial flow 
variables q{x): 

qEx) ^ Proj|,^(,^)|<s^(,^) {qL - ctV {vLexp + div 5 l( x) ^ ^ ^ 


where r is a positive gradient descent parameter. 

This yields the algorithm for optimizing the continuous Potts model using repeated Bregman proximal pro¬ 
jections and Chambolle iterations: 


VL, ul{x) t- 1/#L; 
while not converged do 

VL, ■ul(x) <- ul{x) exp ■ 

VL, qL{x) ^ Proj|,^(^)|<s^(,,) {qL^x) - ctVul{x)) ; 
a{x) ^ Eyh’^Lix); 

VL, ULix) UL{x)/a{x)] 

end 


Algorithm 1: Pseudo-Flow Algorithm: Continuous Potts Model 


3.3 Pseudo-Flow Approach to Directed Acyclic Graphical Max-Flow 

For the sake of generalization, we shall begin developing a pseudo-flow model for DAGMF first, applying it to 
the special cases of HMF and Ishikawa models. As with the Potts model, note that Baxter et al.E^ proved the 
equivalence of the DAGMF model: 

minY^ / {Dl{x)ul{x) + SL{x)\VuL{x)\)dx 

n{x) ^ 

s.t. \IL{ul{,x) > 0 ) 

/ \ (9) 

VL I Ul{x) = ^ W(^L^L,)UL'ix) j 

V L'&L.C / 

usix) = 1 
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to the primal model: 


ps{x)dx 


max 

p{x),q{x) 

n 

s.t. pl{x) < Dl{x), where L.C = 0 
\qL{x)\ < Sl{x) L ^ S 

div qL{x)+ pl{x) - ^ W(L',L)PL'(a:) = 0 

L'SL.P 


/ 


through the primal-dual model: 


min max / psix) div qL{x) -Hpl(x) - V W(l',l)Pl'{x) dx 

u(x) p(x),q(x) } \ I 

\ ) y\ )j y L'GL.P / 


s.t. pl{x) < Dl{x), where L.C = ^ 
\qL{x)\ < Sl{x) S . 


(10) 


( 11 ) 


As with before, we will develop a non-smooth pseudo-flow representation from the primal-dual representation 
m- To do so, we will develop some notation, specifically: 

L = {L|L.1:7 = 0}, 


and 


dpix) 


0, if L = S' 

Dl{x) + div qL^x) + D'^L'ix), if L e L 

divgL(a;) + Y.L'&L.p^(L'.L)dL’{x), else 


pGpath(Al.,B) {M,N)Gp 

[ T,VLgA.c'^(A,L)W(^L,B), 


it A = B 

if B is not a descendant of A, i.e. B ^ A.C* 
else 


dL^x) where L G L represents the flow excess of an end-label, L, taking into account the spatial flows for super¬ 
objects containing L. Wi^a.b) where i? G L represents the amount of weight assigned to a super-object based on 
a single component, the end-label B. These definitions mirror the top-down process of flow excess accumulation 
and the bottom-up process of label accumulation in the original DAGMF algorithirfi^ and will play such a role 
in the pseudo-flow formulation. L gives a convenient notation for end-labels. 


As with the previous method, we will develop a pseudo-flow representation which is amenable to solving 
through a mixture of Bregman proximal projections and Chambolle iterations. The entropic distance metric 
used in this formulation is: 


dg(u,v) = / I ul{x)\b.(ul{x)/vl{x)) - ul{x)+vl{x)\ dx (12) 

Q Vvlgl / 

which can be verified to be a Bregman distance (when \IL,ul{x) G [0,1]) using the entropy function: 

g{u) = / ^ {uL(x)\nuL{x) - ul{x)) dx . 
i Viet 

This may at first not look like a true distance, since if there were no constraints on the values of ul{x) where 
L ^ L it would only be a pseudo-metric. However, it must be noted that since these are constrained, any two 
labelings in which the values are equal for end-labels must also have equal values for non-end-labels. Since 
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dg(u,v) = 0 only if u{x) and v{x) have the same values for end-labels, dg{u,v) = 0 also implies they have the 
sames for non-end-labels and thus is a valid distance. 

Consider the non-smooth pseudo-flow formulation: 


max 

\qL{x)\<SL(x) 


/ 


min (dLix)) dx 
lgl 


(13) 


which can be derived from equation (HU as follows: 


min max / ps{x)y^ ul{x) div qL{x) + pl{x) - 'V] W(l',l)Pl'{x) dx 

uix) p{x),q{x) / \ I 

n \ L'^L.P / 

s.t. VL G L, pl{x) < Dl{x) 

VL, \qL{x)\ < Sl{x) L^S 

— min max J E ul{x) div qLix) + ^ ul{x)pl{x) dx 
u(x)p(x),g(x)7 J 

S.t. VL G L, pLix) < Dl{x) 

VL, \qL{x)\ <SLix)L^S 

usix) = l,yL ul{x) = ^ W(l,l')Ul'{x) 

L'GL.C 

(which is guaranteed by the equivalence of the DAGMF formulation and the primal-dual 
representation) 

= min max E ul{x) div qL{x) -b ^ ul{x)Dl{x) dx 

U{x) q{x) J \ / 

s.t. VL, \qL{x)\ < Sl{x) L^S 
ul{x) = 1 , VL, ul{x) > 0 

vlgl 

(which is guaranteed by the equivalence of the DAGMF formulation and the primal-dual 
representation) 

= min max [ V UL{x)dL{x)dx 

I VLGL 

S.t. VL, \qLix)\ < Sl{x) L^S 
ulIx) = 1 , VL, UL^x) > 0 

VLeL 

= maxmin f V uUx)dUx)dx 
q{x) uix) 7 

S.t. VL, \qL{x)\ < Sl{x) L^S 

y^ ul{x) = 1 , VL, ul{x) > 0 
vlgl 

= max / TcdnldLix)) dx . 

\qLix)\<SLix)J 

n 

As with earlier, we can combine this representation with the entropic distance to yield an improved labeling 
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u{x) proximal to v(x) via: 


M(a:) =argminmax [ V UL{x)dL{x)dx + cdg{u,v) 

u(x) 9 (^) J 

s.t. VL, \qLix)\ < SUx) L ^ S 
ML(a;) = 1, VL, ■ul(x) > 0 . 


(14) 


VieL 


This is equivalent to that used for the Potts model (jH]). Therefore we can solve for u{x) for all end-labels L G L: 

?;L(a:)exp 


wl(x) = 


EvL'eL VL'{x) exp 


(15) 


and propagate these labels upwards to get the values of ul{x) where L ^ h using the labeling constraints. A 
corollary to this is that this label update equation approaches the pseudo-flow representation as c —>■ 0. 

Lastly, we must consider updating the spatial flows, once again by finding the gradient of equation (ITT)) with 
respect to divgi(x). Doing so yields the update equation with positive gradient descent parameter r: 


qLix) 


P™j|9i,(DI<Si.(D (ql{x)-ctV (vL{x)exp (-^^))) , 

{qL{x)-CTV (j2L,^i^W(^L,L')VL'ix)exp(^-^ 


(x) 


if L G L 
else 


(16) 


All together, this framework yields the following algorithm: 


VLgL, wl(x)^1/|L|; 
while not converged do 

VL, dL{x) ■(- divgL(x); 

VL G L, dhix) ■<— dL{x) -I- Dl{x); 
for L in order O/j^} do 

for L' G L.C do 

I dL'{x) ^ dL'(x) + W(^L,L')dLix)] 

end 

end 

VL G L, ul{x) ^ ul{x) exp ; 

VL G L, dL{x) ■<— ul{x); 

a{x) ^ T,vLei.'^L{x); 

VL G L, ul(x) ul(x) /a{x)] 

VL ^ L, dL{x) ■<— 0; 

for L in order do 

qrix) ^ Proj|,^(,^)|<s^(,^) {qrix) - crVdLix)) ; 
for L' G L.P/{S] do 

I dL'{x) -1^ du +W(L>^L)dL{x)\ 

end 

end 

end 

Algorithm 2: Pseudo-Flow Algorithm: Directed Acyclic Graph Max-Flow 


where O is the top-down topological ordering starting with S and ending with the labels in L and is 
the inverse thereof. Note that the labels for intermediate nodes have also become implicit, further minimizing 
memory load. Note that this algorithm uses the same buffer for holding the flow excess d^ propogating top-down 











and the spatial flow fields for updating the spatial flow buffers propagating up, further reducing the number of 
buffers required. 

This algorithm requires 5 buffers per end-label, 4 buffers per non-end-label not including the source S', and 1 
additional buffer for accumulation purposes. This is a large improvement over the full-flow approach by Baxter 
et al.^ which requires 6 buffers per end-label, 7 buffers per non-end-label not including the source S, and 2 
additional buffer related to the source node. 

3.4 Application of Pseudo-Flow to Ishikawa and other Hierarchical Models 

It should be noted that the HMF and Ishikawa models are special cases of the DAGMF model, meaning that the 
same algorithm could be immediately applied to them. In the case of HMF, the DAGMF algorithm is equivalent 
to the algorithm yielded by repeating the process of developing a pseudo-flow representation and solving it using 
the aforementioned steps. In this case, it may be somewhat simplified as the orderings O and correspond 
to simple bottom-up and top-down tree traversals, which can be readily implemented using recursion. 

The Ishikawa model is more interesting in that it’s formulation renders many of the flow variables redundant, 
and a separate instantiation of the pseudo-flow algorithm would be warranted. Modifications to the DAGMF 
pseudo-flow algorithm for an Ishikawa model with N levels and therefore iV -|- 1 labels, yield: 


VTi, i G 0..N, ULi{x) ■<— 1/ (iV -|- 1); 
while not converged do 
dLoix) G- 0 ; 

for Li in order i G [I..A^] do 

I dL^{x) dLi_iix) + diY qLi{x); 

end 

VLi, di, (x) ^ dLiix) + DlG, 

yLi, ULi (x) ^ ULi (x) exp ; 

VL^, dLi(x) G- ULi(x); 

a(x) G- Evl,'“D( a’); 

VL^, ULi{x) G- ULi(x)/a(x); 
for Li in order i G [A — 1..1] do 

I dLi{x) ^ dLiix) + dLi+iix)'; 

end 

VLi,i G I..A, (7l,(x) ^ Projj,^j„)|<s^.(„) (qrAx) - cTVdL,(x)) ; 

end 

Algorithm 3: Pseudo-Flow Algorithm: Continuous Ishikawa Model 

This algorithm involves A -|- I buffers for the labelings, I for an accumulator, A for flow excess values (noting 
that dLo(x) is largely a mathematical formality and always takes on a value of 0 during propagation or Dl^(x) 
during label updates) and 3A for spatial flows. In total, this yields 5A -|- 2 buffers which is an improvement over 
the 6A -I- I buffers required to efficiently implement the algorithm Bae et al.^^ 

4. DISCUSSION AND CONCLUSIONS 

In this paper, we have presented a framework for the incorporation of entropic distance and Bregman proximal 
projections into the optimization of the Potts model. This approach can be extended to the case of Ishikawa 
models, hierarchical models, and general set-theoretic models with minimal additional theoretical development, 
demonstrating its general applicability in continuous max-flow segmentation. 

The pseudo-flow approach has the theoretical advantage over the full-flow model in that it constrains in¬ 
termediate segmentations to be feasible allowing for meaningful truncation. It has the practical advantage of 
requiring a significantly smaller memory footprint that makes it more amenable to implementation on massively 
parallel computing graphics hardware where memory is available in as large a quantity. 
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